Chester
Rethinking Retrieval-Augmented Generation for Medicine: A Large-Scale, Systematic Expert Evaluation and Practical Insights
Kim, Hyunjae, Sohn, Jiwoong, Gilson, Aidan, Cochran-Caggiano, Nicholas, Applebaum, Serina, Jin, Heeju, Park, Seihee, Park, Yujin, Park, Jiyeong, Choi, Seoyoung, Contreras, Brittany Alexandra Herrera, Huang, Thomas, Yun, Jaehoon, Wei, Ethan F., Jiang, Roy, Colucci, Leah, Lai, Eric, Dave, Amisha, Guo, Tuo, Singer, Maxwell B., Koo, Yonghoe, Adelman, Ron A., Zou, James, Taylor, Andrew, Cohan, Arman, Xu, Hua, Chen, Qingyu
Large language models (LLMs) are transforming the landscape of medicine, yet two fundamental challenges persist: keeping up with rapidly evolving medical knowledge and providing verifiable, evidence-grounded reasoning. Retrieval-augmented generation (RAG) has been widely adopted to address these limitations by supplementing model outputs with retrieved evidence. However, whether RAG reliably achieves these goals remains unclear. Here, we present the most comprehensive expert evaluation of RAG in medicine to date. Eighteen medical experts contributed a total of 80,502 annotations, assessing 800 model outputs generated by GPT-4o and Llama-3.1-8B across 200 real-world patient and USMLE-style queries. We systematically decomposed the RAG pipeline into three components: (i) evidence retrieval (relevance of retrieved passages), (ii) evidence selection (accuracy of evidence usage), and (iii) response generation (factuality and completeness of outputs). Contrary to expectation, standard RAG often degraded performance: only 22% of top-16 passages were relevant, evidence selection remained weak (precision 41-43%, recall 27-49%), and factuality and completeness dropped by up to 6% and 5%, respectively, compared with non-RAG variants. Retrieval and evidence selection remain key failure points for the model, contributing to the overall performance drop. We further show that simple yet effective strategies, including evidence filtering and query reformulation, substantially mitigate these issues, improving performance on MedMCQA and MedXpertQA by up to 12% and 8.2%, respectively. These findings call for re-examining RAG's role in medicine and highlight the importance of stage-aware evaluation and deliberate system design for reliable medical LLM applications.
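The abstract reports evidence-selection precision and recall and names evidence filtering as one mitigation. The sketch below shows how such per-query metrics, and a simple score-threshold filter, might be computed. It assumes each query comes with the set of passage IDs the model cited and the set experts judged relevant. The function names, passage IDs, scores and threshold are all hypothetical and do not reproduce the paper's scoring protocol.

```python
from typing import Dict, Set, Tuple


def selection_precision_recall(cited: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of the passages a model cited, against expert relevance labels."""
    true_positives = len(cited & relevant)
    # Convention (an assumption): empty sets yield 0.0 instead of raising ZeroDivisionError.
    precision = true_positives / len(cited) if cited else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def filter_evidence(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Keep only passages whose (hypothetical) relevance score meets the threshold."""
    return {pid for pid, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    # Hypothetical top-k retrieval for a single query; IDs and labels are made up.
    relevant = {"p1", "p4"}
    cited = {"p1", "p2", "p3"}
    p, r = selection_precision_recall(cited, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.33 recall=0.50

    scores = {"p1": 0.91, "p2": 0.12, "p3": 0.35, "p4": 0.78}
    kept = filter_evidence(scores, threshold=0.5)
    print(sorted(kept))  # ['p1', 'p4']
```

Filtering before generation trades recall for precision. Where the threshold sits would have to be tuned per retriever.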
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Evaluating the Limitations of Local LLMs in Solving Complex Programming Challenges
Matotek, Kadin, Cassel, Heather, Amiruzzaman, Md, Ngo, Linh B.
This study examines the performance of today's open-source, locally hosted large language models (LLMs) in handling complex competitive programming tasks with extended problem descriptions and contexts. Building on the original Framework for AI-driven Code Generation Evaluation (FACE), the authors retrofit the pipeline to work entirely offline through the Ollama runtime, collapse FACE's sprawling per-problem directory tree into a handful of consolidated JSON files, and add robust checkpointing so that multi-day runs can resume after failures. The enhanced framework generates, submits, and records solutions for the full Kattis corpus of 3,589 problems across eight code-oriented models ranging from 6.7 to 9 billion parameters. The submission results show that overall pass@1 accuracy is modest for the local models, with the best models achieving approximately half the acceptance rate of the proprietary models, Gemini 1.5 and ChatGPT-4. These findings expose a persistent gap between private, cost-controlled LLM deployments and state-of-the-art proprietary services, yet also highlight the rapid progress of open models and the practical benefits of an evaluation workflow that organizations can replicate on in-house hardware.
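The abstract mentions consolidated JSON results and checkpointing so that multi-day runs can resume after failures. A minimal sketch of that pattern follows, assuming a single JSON checkpoint keyed by problem ID. `generate_solution` and `judge` are hypothetical stand-ins for the model call (for example through Ollama) and the Kattis submission, and the file layout is illustrative, not FACE's actual format.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def run_with_checkpoint(
    problem_ids: Iterable[str],
    generate_solution: Callable[[str], str],  # hypothetical: prompts a local model
    judge: Callable[[str, str], bool],        # hypothetical: True if the verdict is "Accepted"
    checkpoint_path: str,
) -> Dict[str, bool]:
    """Process problems one at a time, persisting results after each so a crash loses at most one."""
    results: Dict[str, bool] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            results = json.load(f)  # resume from the previous run

    for pid in problem_ids:
        if pid in results:
            continue  # already solved and judged in an earlier run
        code = generate_solution(pid)
        results[pid] = judge(pid, code)
        # Write to a temporary file, then atomically replace the checkpoint,
        # so an interruption mid-write cannot leave a corrupted JSON file.
        directory = os.path.dirname(os.path.abspath(checkpoint_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(results, f)
        os.replace(tmp_path, checkpoint_path)
    return results


def pass_at_1(results: Dict[str, bool]) -> float:
    """With one attempt per problem, pass@1 is the fraction of accepted submissions."""
    return sum(results.values()) / len(results) if results else 0.0
```

Calling `run_with_checkpoint` again with the same `checkpoint_path` skips every problem already recorded, which is what lets an interrupted run pick up where it stopped.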
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- North America > United States > Pennsylvania > Chester County > West Chester (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Neural network interpretability with layer-wise relevance propagation: novel techniques for neuron selection and visualization
Bhati, Deepshikha, Neha, Fnu, Amiruzzaman, Md, Guercio, Angela, Shukla, Deepak Kumar, Ward, Ben
Interpreting complex neural networks is crucial for understanding their decision-making processes, particularly in applications where transparency and accountability are essential. The proposed method addresses this need by focusing on Layer-wise Relevance Propagation (LRP), a technique used in explainable artificial intelligence (XAI) to attribute neural network outputs to input features through backpropagated relevance scores. Existing LRP methods often lack precision when evaluating the contributions of individual neurons. To overcome this limitation, we present a novel approach that improves the parsing of selected neurons during LRP backward propagation, using the Visual Geometry Group 16 (VGG16) architecture as a case study. Our method constructs neural network graphs to highlight critical paths, visualizes these paths with heatmaps, and optimizes neuron selection using error metrics such as Mean Squared Error (MSE) and Symmetric Mean Absolute Percentage Error (SMAPE). Additionally, we use a deconvolutional visualization technique to reconstruct feature maps, offering a comprehensive view of the network's inner workings. Extensive experiments demonstrate that our approach enhances interpretability and supports the development of more transparent artificial intelligence (AI) systems for computer vision applications. This advancement has the potential to improve the trustworthiness, and thereby the reliability and effectiveness, of AI models in real-world machine vision applications.
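LRP redistributes an output score backward layer by layer so that relevance is approximately conserved. As an illustration only, here is a minimal NumPy sketch of the standard LRP-epsilon rule on a tiny fully connected ReLU network with random weights. This is a toy, not VGG16, and not the authors' neuron-selection method. It is followed by the usual definition of MSE and one common variant of SMAPE. How the paper applies these metrics to select neurons is not reproduced here.

```python
import numpy as np


def forward(x, weights, biases):
    """Return the activations of every layer (input included) of a ReLU MLP."""
    activations = [x]
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = activations[-1] @ W + b
        # ReLU on hidden layers; the last layer stays linear (raw class scores).
        activations.append(np.maximum(z, 0.0) if i < len(weights) - 1 else z)
    return activations


def lrp_epsilon(activations, weights, biases, target, eps=1e-6):
    """Propagate the target class score back to the input with the LRP-epsilon rule."""
    relevance = np.zeros_like(activations[-1])
    relevance[target] = activations[-1][target]  # start from the chosen output score
    for layer in range(len(weights) - 1, -1, -1):
        a, W, b = activations[layer], weights[layer], biases[layer]
        z = a @ W + b
        # The epsilon term stabilises small denominators (and absorbs a little relevance).
        z = z + eps * np.where(z >= 0, 1.0, -1.0)
        s = relevance / z          # relevance per unit of pre-activation
        relevance = a * (W @ s)    # R_i = a_i * sum_j w_ij * R_j / z_j
    return relevance


def mse(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))


def smape(actual, predicted):
    """SMAPE in percent (one common variant); terms where both values are 0 count as 0."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    denom = (np.abs(actual) + np.abs(predicted)) / 2.0
    diff = np.abs(predicted - actual)
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return float(100.0 * np.mean(ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [4, 8, 3]
    weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]  # zero biases keep relevance (nearly) conserved
    x = rng.normal(size=4)
    acts = forward(x, weights, biases)
    target = int(np.argmax(acts[-1]))
    R = lrp_epsilon(acts, weights, biases, target)
    print(R, R.sum(), acts[-1][target])  # the last two values should nearly match
```

With nonzero biases, part of the relevance is attributed to the biases, so the input relevances no longer sum exactly to the output score.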
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- North America > United States > Pennsylvania > Chester County > West Chester (0.04)
A Tiered GAN Approach for Monet-Style Image Generation
Neha, FNU, Bhati, Deepshikha, Shukla, Deepak Kumar, Amiruzzaman, Md
Generative Adversarial Networks (GANs) have proven to be a powerful tool for generating artistic images, capable of mimicking the styles of renowned painters such as Claude Monet. This paper introduces a tiered GAN model that progressively refines image quality through a multi-stage process, enhancing the generated images at each step. The model transforms random noise into detailed artistic representations while addressing common challenges such as training instability, mode collapse, and poor output quality. The approach combines downsampling and convolutional techniques, enabling the generation of high-quality Monet-style artwork while optimizing computational efficiency. Experimental results demonstrate the architecture's ability to produce foundational artistic structures, though further refinements are necessary to achieve higher levels of realism and fidelity to Monet's style. Future work will focus on improving training methodologies and model complexity to bridge the gap between generated and true artistic images. Additionally, the limitations of traditional GANs in artistic generation are analyzed, and strategies to overcome these shortcomings are proposed.
- North America > United States > Ohio > Portage County > Kent (0.04)
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- North America > United States > Pennsylvania > Chester County > West Chester (0.04)
- North America > United States > New Jersey > Essex County > Newark (0.04)
From classical techniques to convolution-based models: A review of object detection algorithms
Neha, Fnu, Bhati, Deepshikha, Shukla, Deepak Kumar, Amiruzzaman, Md
Object detection is a fundamental task in computer vision and image understanding, with the goal of identifying and localizing objects of interest within an image while assigning them corresponding class labels. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. These methods combined low-level features with contextual information and lacked the ability to capture high-level semantics. Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data. These features include both semantic and high-level representations essential for accurate object detection. This paper reviews object detection frameworks, starting with classical computer vision methods. We categorize object detection approaches into two groups: (1) classical computer vision techniques and (2) CNN-based detectors. We compare major CNN models, discussing their strengths and limitations. In conclusion, this review highlights the significant advancements in object detection through deep learning and identifies key areas for further research to improve performance.
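Two building blocks that most CNN-based detectors share are intersection-over-union (IoU), used to compare a predicted box with a ground-truth box, and greedy non-maximum suppression (NMS), used to remove duplicate detections of the same object. Neither is specific to any model in the review. Below is a minimal pure-Python sketch, assuming boxes are given as (x1, y1, x2, y2) in pixel coordinates.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```

The quadratic loop is fine for illustration. Production detectors use vectorized or per-class variants of the same idea.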
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Ohio > Portage County > Kent (0.04)
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- Overview (1.00)
- Research Report (0.90)
Classified as unknown: A novel Bayesian neural network
We derive estimates for the parameters of the output distribution of the softmax activation function using the probit function. As an application, we develop a new, efficient Bayesian learning algorithm for fully connected neural networks in which training and prediction are performed in closed form within the Bayesian inference framework. This approach allows sequential learning and requires neither computationally expensive gradient calculations nor Monte Carlo sampling. Our work generalizes the Bayesian algorithm for a single perceptron for binary classification in \cite{H} to multi-layer perceptrons for multi-class classification.
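The abstract's key ingredient is a probit-based approximation of the output distribution. The classical binary special case already shows the idea. This is MacKay's probit approximation to the logistic sigmoid, standard background rather than the paper's multi-class softmax result. If the pre-activation is Gaussian, a ~ N(mu, var), then E[sigmoid(a)] is approximately sigmoid(mu / sqrt(1 + pi * var / 8)). This closed form needs no sampling. The sketch below compares it with a Monte Carlo estimate. The values of mu and var and the sample count are arbitrary.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def probit_approx_expected_sigmoid(mu: float, var: float) -> float:
    """Closed-form approximation of E[sigmoid(a)] for a ~ N(mu, var).

    Replaces sigmoid(x) by the probit Phi(sqrt(pi / 8) * x); the Gaussian integral
    of a probit is exact, which yields the kappa scaling below.
    """
    kappa = 1.0 / math.sqrt(1.0 + math.pi * var / 8.0)
    return sigmoid(kappa * mu)


def monte_carlo_expected_sigmoid(mu: float, var: float, n: int = 200_000, seed: int = 0) -> float:
    """Sampling-based reference estimate of the same expectation."""
    rng = random.Random(seed)
    sd = math.sqrt(var)
    return sum(sigmoid(rng.gauss(mu, sd)) for _ in range(n)) / n


if __name__ == "__main__":
    for mu, var in [(0.0, 1.0), (1.0, 4.0), (-2.0, 9.0)]:
        approx = probit_approx_expected_sigmoid(mu, var)
        mc = monte_carlo_expected_sigmoid(mu, var)
        print(f"mu={mu:+.1f} var={var:.1f}  closed form={approx:.4f}  Monte Carlo={mc:.4f}")
```

The approximation error stays at the level of about 0.02 in probability, because the sigmoid and the rescaled probit never differ by more than that. This is why such closed forms can stand in for sampling.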
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.75)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Distributed Swarm Intelligence
Kanjula, Karthik Reddy, Kolla, Sai Meghana
This paper presents the development of a distributed application that facilitates the understanding and application of swarm intelligence in solving optimization problems. The platform comprises a search space of customizable random particles, allowing users to tailor the solution to their specific needs. By using the Ray distributed computing framework, the application can support multiple users simultaneously, offering a flexible and scalable solution. The primary objective of this project is to provide a user-friendly platform that enhances the understanding and practical use of swarm intelligence in problem-solving.
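The abstract does not name the specific algorithm. Particle swarm optimization (PSO) is a common instance of swarm intelligence over a search space of random particles, so the sketch below assumes it. It is a single-process, pure-Python version. In a Ray-based deployment like the one described, independent swarms or fitness evaluations could be dispatched as remote tasks, but that is not shown here. The inertia and acceleration coefficients are typical textbook values, not values taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 30,
    iterations: int = 200,
    w: float = 0.7,    # inertia weight
    c1: float = 1.5,   # pull toward each particle's own best position
    c2: float = 1.5,   # pull toward the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` over a box with basic global-best particle swarm optimisation."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (best_pos[i][d] - pos[i][d])
                    + c2 * r2 * (g_pos[d] - pos[i][d])
                )
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the search space
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)  # minimum 0 at the origin
    best, value = pso(sphere, bounds=[(-5.0, 5.0)] * 3)
    print(best, value)  # value should be very close to 0
```

Exposing the swarm size, bounds and objective as parameters mirrors the "customizable random particles" idea, though the platform's actual options are not specified in the abstract.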
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- North America > United States > Pennsylvania > Dauphin County > Harrisburg (0.04)
- North America > United States > Pennsylvania > Chester County > West Chester (0.04)
Light in the Larynx: a Miniaturized Robotic Optical Fiber for In-office Laser Surgery of the Vocal Folds
Chiluisa, Alex J., Pacheco, Nicholas E., Do, Hoang S., Tougas, Ryan M., Minch, Emily V., Mihaleva, Rositsa, Shen, Yao, Liu, Yuxiang, Carroll, Thomas L., Fichera, Loris
This letter reports the design, construction, and experimental validation of a novel hand-held robot for in-office laser surgery of the vocal folds. In-office endoscopic laser surgery is an emerging trend in laryngology: it promises to deliver the same patient outcomes as traditional surgical treatment (i.e., in the operating room) at a fraction of the cost. Unfortunately, office procedures can be challenging to perform: the optical fibers used for laser delivery can only emit light forward in a line-of-sight fashion, which severely limits anatomical access. The robot we present in this letter aims to overcome these challenges. The end effector of the robot is a steerable laser fiber, created by combining a thin optical fiber (0.225 mm) with a tendon-actuated nickel-titanium notched sheath that provides bending. This device can be seamlessly used with most commercially available endoscopes, as it is small enough (1.1 mm) to pass through a working channel. To control the fiber, we propose a compact actuation unit that can be mounted on top of the endoscope handle so that, during a procedure, the physician can operate both the endoscope and the steerable fiber with a single hand. We report simulation and phantom experiments demonstrating that the proposed device substantially enhances surgical access compared to current clinical fibers.
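Bending matters because it frees the laser from line-of-sight delivery. A common simplification for tendon-driven notched sheaths is the constant-curvature model, in which the bent segment is treated as a circular arc. The sketch below uses that model as an assumption for illustration; the letter's own kinematic model and dimensions may differ, and the 10 mm segment length is hypothetical. It computes the fiber-tip position and the direction in which laser light would leave the tip for a given bending angle.

```python
import math
from typing import Tuple


def tip_pose(arc_length_mm: float, bend_angle_rad: float) -> Tuple[float, float, float]:
    """Tip position (x, z) and emission angle of a planar constant-curvature segment.

    z runs along the straight (unbent) fiber axis and x points in the bending direction.
    The returned angle is measured from the z axis; under this model, laser light
    leaves the tip at that angle.
    """
    if abs(bend_angle_rad) < 1e-12:
        return 0.0, arc_length_mm, 0.0  # straight fiber: forward, line-of-sight emission only
    radius = arc_length_mm / bend_angle_rad  # curvature kappa = 1 / radius
    x = radius * (1.0 - math.cos(bend_angle_rad))
    z = radius * math.sin(bend_angle_rad)
    return x, z, bend_angle_rad


if __name__ == "__main__":
    for deg in (0, 45, 90):
        x, z, a = tip_pose(10.0, math.radians(deg))  # hypothetical 10 mm bending segment
        print(f"{deg:>3} deg: tip at x={x:.2f} mm, z={z:.2f} mm, emitting {math.degrees(a):.0f} deg off-axis")
```

A straight fiber can only aim along its axis. Even a moderate bend lets the beam reach tissue that is off-axis from the endoscope's working channel.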
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Massachusetts > Worcester County > Worcester (0.04)
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.82)
Behavioral-clinical phenotyping with type 2 diabetes self-monitoring data
Levine, Matthew E., Albers, David J., Burgermaster, Marissa, Davidson, Patricia G., Smaldone, Arlene M., Mamykina, Lena
Keywords: self-monitoring data, type 2 diabetes, machine learning, phenotyping, precision medicine. Objective: To evaluate unsupervised clustering methods for identifying individual-level behavioral-clinical phenotypes that relate personal biomarkers and behavioral traits in type 2 diabetes (T2DM) self-monitoring data. Materials and Methods: We used hierarchical clustering (HC) to identify groups of meals with similar nutrition and glycemic impact for 6 individuals with T2DM who collected self-monitoring data. We evaluated clusters on: 1) correspondence to gold standards generated by certified diabetes educators (CDEs) for 3 participants; 2) face validity, as rated by CDEs; and 3) impact on CDEs' ability to identify patterns for another 3 participants. Results: The gold standard (GS) included 9 patterns across 3 participants, all of which were rediscovered using HC: 4 GS patterns were consistent with patterns identified by HC (over 50% of meals in a cluster followed the pattern), and the other 5 were included as subgroups in broader clusters. After reviewing clusters, CDEs identified patterns that were more consistent with the data (a 70% reduction in contradictions between patterns and participants' records). Discussion: Hierarchical clustering of blood glucose and macronutrient consumption appears suitable for discovering behavioral-clinical phenotypes in T2DM. Most clusters corresponded to the gold standard and were rated positively by CDEs for face validity. Cluster visualizations helped CDEs identify more robust patterns in nutrition and glycemic impact, creating new possibilities for visual analytic solutions. Conclusion: Machine learning methods can use diabetes self-monitoring data to create personalized behavioral-clinical phenotypes, which may prove useful for delivering personalized medicine.
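The study used hierarchical clustering (HC) to group meals with similar nutrition and glycemic impact. The abstract does not specify the linkage, the distance, or the exact features. The following pure-Python sketch therefore assumes average linkage on Euclidean distance over standardized, hypothetical per-meal features: carbohydrates in grams and post-meal glucose rise in mg/dL. It illustrates agglomerative clustering in general, not the study's pipeline.

```python
import math
import statistics
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each feature column so that grams and mg/dL are on comparable scales."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def average_linkage(points: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance until `n_clusters` remain."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def dist(a: List[int], b: List[int]) -> float:
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = (math.inf, -1, -1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical meals: [carbohydrates (g), post-meal glucose rise (mg/dL)]
    meals = [[30, 25], [35, 30], [28, 20], [90, 110], [85, 95], [95, 120]]
    print(average_linkage(standardize(meals), n_clusters=2))
```

Standardizing first matters because otherwise the feature with the larger numeric range dominates the distance. Cutting the hierarchy at different cluster counts would expose broader clusters and their subgroups, echoing how some gold-standard patterns surfaced as subgroups.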
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Pennsylvania > Delaware County > Chester (0.04)
- North America > United States > Pennsylvania > Chester County > West Chester (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
The Third International Conference on Artificial Intelligence and Education
Winter 1987
The conference attracted over 400 participants from all over the world, who gathered to present ... concerning AI and education. This article presents a synopsis of the major presentations and an overview ...
... of Pittsburgh, Pittsburgh, Penn., 8-10 May 1987. The conference cochairmen, Stellan Ohlsson and Jeff Bonar, also gave brief welcomes to the participants. With so many attendees from abroad (The Netherlands, Japan, Canada, West Germany, England, Sweden, France, and Hong Kong were all represented by speakers), the international flavor of the conference was well established.
The obvious disappointment of the audience could be felt. However, instead of giving ... the opening address, "Programming as Artifact Design." This change worked out well because Soloway acted like a cheerleader, getting the crowd fired up about the subject of AI ...
As Soloway described the changes in what he felt ... the construction of mechanisms and explanations last year to the design of artifacts today, he was clearly giving ... about transference, leading Soloway to conclude that transference is not the ultimate goal for teaching and tutoring programming. Instead, the concern should be for the development of synthesis skills and "high-order ..." ... This model does not vary significantly from standard software engineering ... requiring these steps be followed in a strict order, Soloway contends that the way real programmers work best is to bounce from one stage to another as the need arises. ... much rigidity has recently been imposed on programmers by the engineering approach. He demonstrated ... He also suggested that "current programming is to synthesis as a hammer is to a thumb. Each is as likely to cause pain as [it is] to get the job done." The emphasis should be on synthesis skills for designing, generating, and evaluating alternative artifacts that ...
Andy di Sessa, in his talk "Social Niches for Future Software," focused on the need to provide a medium ... teacher. Some of the kinds of software he felt should ... He considers current applications to be "the ..." ... challenge to the computer science community to develop higher-level ...
... the differences between beginner and expert. Finally, Wender suggested that ... him, one could easily mistake him for ...
...der was echoed by Ben du Boulay in "What Should a Programming Environment ... Like?" ... Bonar's comment in his opening welcome that we are "on the verge of a breakthrough" in developing tutoring systems concerned du ...
Beyond the usual categories supplied by the conference structure, several themes linked many of the papers and presentations.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.24)
- North America > Canada (0.24)
- Europe > United Kingdom > England (0.24)